Zijie Zhao
Yunqing Tang
Chin Ming Wong
Peng Yang
Colab Link: https://drive.google.com/file/d/1n9nFeK_VoV1K7aYEP7dWDcOBmnlZ0Pc4/view?usp=sharing
Overview
Have you ever seen a photo of a person who looks extremely familiar, whose name is on the tip of your tongue, but you just can't quite recall it? If you're bad with names like me, this happens frequently, especially with celebrities. That's why we thought it would be useful to develop a model that can tell you the name. To make the project more feasible, we limited the pool of celebrities to our 10 favourite cast members of the Marvel Avengers series.
The idea is to take an image of a face and run it through a model that will tell you if the person is actually part of the Avengers cast. If so, then a second model will give the name of the Avengers celebrity. The overall pipeline is shown in the figure below:
There are two models to train in this pipeline, so we will work on each one separately. However, we will preprocess the data used for both models all at once. The models will leverage transfer learning by using pretrained models in PyTorch, but the fully-connected classifiers will be handwritten and trained on the output features of the pretrained models. Afterwards, both models are combined with some preprocessing steps to create the entire pipeline.
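As a rough sketch, the two-stage inference described above can be expressed as a simple gating function. The `is_avenger` and `which_avenger` functions below are hypothetical placeholders standing in for Model 1 and Model 2 (the real models operate on preprocessed face crops, not dictionaries):

```python
# Sketch of the two-stage pipeline: Model 1 gates Model 2.
# Both model calls are hypothetical placeholders for illustration only.

def is_avenger(face) -> bool:
    """Model 1 placeholder: is this face an Avengers cast member?"""
    return face.get("cast_member") is not None

def which_avenger(face) -> str:
    """Model 2 placeholder: which cast member is it?"""
    return face["cast_member"]

def identify(face) -> str:
    """Run the full pipeline: only consult Model 2 if Model 1 says yes."""
    if not is_avenger(face):
        return "Not an Avengers cast member"
    return which_avenger(face)

# Toy inputs standing in for preprocessed face crops
print(identify({"cast_member": "Robert Downey Jr"}))
print(identify({"cast_member": None}))
```

The key design point is that Model 2 is never asked to name a face that Model 1 has already rejected, which keeps the 10-way classifier's job limited to faces it was trained on.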
Related Works
Facial recognition is a well-studied area with applications in multiple fields, from security systems to camera filters on social media applications. As a result, celebrity facial recognition, like facial recognition in general, has already been explored. Compared to the work already done, our project is basic and elementary. Nevertheless, it was still worthwhile to explore the area and to gain insight into, and an appreciation of, the potential of deep learning in facial recognition.
More recently, Google introduced Google Lens, an image recognition technology that has since been deployed on Google Images. It can recognize all kinds of objects, including celebrity faces: a quick search of any celebrity, followed by Google Lens on the image results, returns the name and some short background information. It seemed very accurate when we tested it.
Celebrity look-alikes are a fun area of facial recognition, and some companies such as Clarifai [1] have already capitalized on it (for profit). The small demonstration on its website appears quite accurate, detecting and classifying many faces of varying sizes within a single image.
Much like us, many deep learning enthusiasts have played around with celebrity facial recognition such as [2][3]. Although performance metrics weren't calculated, some of the qualitative examples that they presented showed very good results.
Research papers have also been published recently in this area. In [4], the authors used some of the same datasets as us and report very high accuracy, although it is unclear how they accomplished the task of predicting celebrity names given that the dataset is unlabelled. Nevertheless, this clearly demonstrates that similar work has been done at both a professional (complex) and an amateur (simple) level.
[1] https://www.clarifai.com/models/celebrity-face-recognition
[2] https://towardsdatascience.com/which-celebrity-are-you-d8c6507f21c9
[3] https://medium.com/@gurpreets0610/celebrity-face-recognition-521c2f9bba9
[4] https://www.researchgate.net/publication/327249814_Celebrity_Face_Recognition_using_Deep_Learning
We will first import some necessary libraries. Other libraries will be imported as needed.
# Import libraries
import os
import numpy as np
import torch
import shutil
import PIL
from PIL import Image
import matplotlib.pyplot as plt
import cv2
import torchvision
from torchvision import datasets, models, transforms
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
torch.manual_seed(1) # Set the random seed
Since we are using Google Colab, we will mount Google Drive to access various files.
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')
Google Colab provides (limited) access to a GPU, so we will leverage it to speed up computation.
# GPU
use_cuda = True
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print('Running on device: {}'.format(device))
We will first take a look at some of the raw data.
# Data loading
data_dir = '/content/drive/My Drive/MIE1517 Project/NEW Dataset for Module 1 & 2/Module 2/'
data_transform = transforms.Compose([transforms.ToTensor()])
raw_data = datasets.ImageFolder(data_dir, transform = data_transform)
A single image of Robert Downey Jr. is shown below.
# Display a sample image
im = PIL.Image.open(data_dir + '/Robert Downey Jr/Robert Downey Jr8_4304.jpg')
print(im.width, im.height, im.mode, im.format, type(im))
display(im)
More images of the Avengers cast are shown below.
In our raw dataset, we have some examples that are not square images, and some other examples that are not oriented properly. We will fix these problems in the Data Preprocessing section below.
# Visualize some of the Avengers cast
classes = ['Anthony Mackie', 'Chris Evans', 'Chris Hemsworth',
'Elizabeth Olsen', 'Jeremy Renner', 'Mark Ruffalo',
'Robert Downey Jr', 'Scarlett Johansson', 'Tom Hiddleston',
'Tom Holland']
data_loader = torch.utils.data.DataLoader(raw_data, batch_size = 20, shuffle = True)
# Obtain one batch of training images
dataiter = iter(data_loader)
images, labels = next(dataiter) # .next() was removed in newer PyTorch versions
images = images.numpy() # Convert images to numpy for display
# Plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize = (25, 4))
for idx in np.arange(20):
    ax = fig.add_subplot(2, 10, idx + 1, xticks = [], yticks = []) # 2 rows of 10
    plt.imshow(np.transpose(images[idx], (1, 2, 0)))
    ax.set_title(classes[labels[idx]])
Since we just want the face and nothing else, we will crop the faces out, correct their orientations by detecting eye positions, and save them. To do this, we use the Deepface library to detect and crop faces. The library wraps various state-of-the-art models such as Facebook DeepFace and Google FaceNet; we will use Facebook DeepFace. More information can be found here: https://github.com/serengil/deepface
# Import libraries for face cropping
!pip install deepface
from deepface import DeepFace
import unittest
import random
We will preprocess and split the data for model 1 by cropping the faces and saving them in new directories and folders.
First of all, we set the raw data directories and the target directories for our processed data. For our training and validation datasets, we used the data from Pins Face Recognition Dataset [5] and CelebFaces Attributes (CelebA) Dataset [6] from Kaggle. For our test dataset, we gathered some new, never before seen data from Google Images and cropped images from videos. Our training and validation images are in one folder, and our test images are in another folder.
For Model 1, we will process these two folders separately.
[5] https://www.kaggle.com/hereisburak/pins-face-recognition
[6] https://www.kaggle.com/jessicali9530/celeba-dataset
# Set up data directory
m1_data_dir = '/content/drive/My Drive/MIE1517 Project/NEW Dataset for Module 1 & 2/Module 1/' # Directory of raw main data
m1_test_data_dir = '/content/drive/My Drive/MIE1517 Project/Test Data/Module 1/' # Directory of raw test data
m1_fc_data_dir = '/content/drive/My Drive/MIE1517 Project/NEW Dataset for Module 1 & 2/Face Crop Module1/' # New directory for cropped main data
m1_fc_test_data_dir = '/content/drive/My Drive/MIE1517 Project/Test Data/M1_Face_Cropped_Test/' # New directory for cropped test data
m1_fc_data_split_dir = '/content/drive/My Drive/MIE1517 Project/NEW Dataset for Module 1 & 2/Face Crop Module1 Data Split/' # New directory for main data split
# Classes for model 1
m1_cls = ['Non-Avengers', 'Avengers']
For the training and validation datasets, we first cropped the faces out of the images, adjusted their orientations according to the detected eye positions, and then saved them in the target training and validation directory.
# Crop the faces for main data
for n in range(2): # 2 classes
    image_list = []
    m1_cls_dir = os.path.join(m1_data_dir, m1_cls[n])
    # Make folder if it does not exist
    if os.path.isdir(m1_fc_data_dir + m1_cls[n]) is False:
        os.makedirs(m1_fc_data_dir + m1_cls[n])
    # Get the file names
    for filename in os.listdir(m1_cls_dir):
        image_list.append(filename)
    # Load the images
    for i in image_list:
        img_dir = os.path.join(m1_cls_dir, i)
        img = cv2.imread(img_dir)
        # Detect faces and write to file
        try:
            detected_face = DeepFace.detectFace(img)
            detected_face = detected_face * 255
            cv2.imwrite(m1_fc_data_dir + m1_cls[n] + '/' + i, detected_face[:, :, ::-1])
        except ValueError:
            pass
After saving the processed images, we split the training and validation dataset into two folders at a ratio of 80:20.
It is worth noting that we have 1594 images in total for the Avengers cast and 5593 images for other people. The total number of images is limited by our computational capacity, and there are many more features to learn for the random, non-Avengers images than for the Avengers images. Therefore, to help the model capture these features within that limit, we chose to use more images for the non-Avengers class. We keep the same number of images for the two classes in our test dataset.
# Split the train/val data set into 2 folders, namely train and val
for cls in m1_cls:
    class_dir = m1_fc_data_dir + cls # The folder to look for the images of each class
    img_names = os.listdir(class_dir) # Get the file names of the images
    random.shuffle(img_names) # Shuffle the names
    # Check if there is a train and val folder and make one if not
    if os.path.isdir(m1_fc_data_split_dir + 'train/' + cls) is False:
        os.makedirs(m1_fc_data_split_dir + 'train/' + cls)
    if os.path.isdir(m1_fc_data_split_dir + 'val/' + cls) is False:
        os.makedirs(m1_fc_data_split_dir + 'val/' + cls)
    # Splitting the data
    train_ratio = 0.8
    train, val = np.split(img_names, [int(len(img_names) * train_ratio)])
    # Get the file names and split into each set
    train_names = [class_dir + '/' + name for name in train]
    val_names = [class_dir + '/' + name for name in val]
    print('For {}:'.format(cls))
    print('Total number of images: ', len(img_names))
    print('Total number of training images: ', len(train_names))
    print('Total number of validation images: ', len(val_names))
    print('\n')
    # Copy the images to the new folders
    for file_name in train_names:
        shutil.copy(file_name, m1_fc_data_split_dir + 'train/' + cls)
    for file_name in val_names:
        shutil.copy(file_name, m1_fc_data_split_dir + 'val/' + cls)
For the test dataset, we first cropped the faces out of the images, adjusted their orientations according to the detected eye positions, and then saved them in the target test directory.
# Crop the faces for test data
for n in range(2): # 2 classes
    image_list = []
    m1_cls_dir = os.path.join(m1_test_data_dir, m1_cls[n])
    # Make folder if it does not exist
    if os.path.isdir(m1_fc_test_data_dir + m1_cls[n]) is False:
        os.makedirs(m1_fc_test_data_dir + m1_cls[n])
    # Get the file names
    for filename in os.listdir(m1_cls_dir):
        image_list.append(filename)
    # Load the images
    for i in image_list:
        img_dir = os.path.join(m1_cls_dir, i)
        img = cv2.imread(img_dir)
        # Detect faces and write to file
        try:
            detected_face = DeepFace.detectFace(img)
            detected_face = detected_face * 255
            cv2.imwrite(m1_fc_test_data_dir + m1_cls[n] + '/' + i, detected_face[:, :, ::-1])
        except ValueError:
            pass
We will preprocess and split the data for Model 2 by cropping the faces and saving them in new directories and folders. The processing procedure for Model 2 is very similar to the one for Model 1.
The only two differences are: (1) for our training and validation data, we only used the images from Pins Face Recognition Dataset, and (2) we used a balanced dataset for our ten Avengers cast members.
# Set up data directory
m2_data_dir = '/content/drive/My Drive/MIE1517 Project/NEW Dataset for Module 1 & 2/Module 2/' # Directory of raw main data
m2_test_data_dir = '/content/drive/My Drive/MIE1517 Project/Test Data/Module 2/' # Directory of raw test data
m2_fc_data_dir = '/content/drive/My Drive/MIE1517 Project/NEW Dataset for Module 1 & 2/Face Crop Module2/' # New directory for cropped main data
m2_fc_test_data_dir = '/content/drive/My Drive/MIE1517 Project/Test Data/M2_Face_Cropped_Test/' # New directory for cropped test data
m2_fc_data_split_dir = '/content/drive/My Drive/MIE1517 Project/NEW Dataset for Module 1 & 2/Face Crop Module2 Data Split/' # New directory for main data split
# Classes for model 2
m2_cls = ['Anthony Mackie/', 'Chris Evans/', 'Chris Hemsworth/', 'Elizabeth Olsen/', 'Jeremy Renner/',
'Mark Ruffalo/', 'Robert Downey Jr/', 'Scarlett Johansson/', 'Tom Hiddleston/', 'Tom Holland/']
# Crop the faces for main data
for n in range(10): # 10 classes
    image_list = []
    m2_cls_dir = os.path.join(m2_data_dir, m2_cls[n])
    # Make folder if it does not exist
    if os.path.isdir(m2_fc_data_dir + m2_cls[n]) is False:
        os.makedirs(m2_fc_data_dir + m2_cls[n])
    # Get the file names
    for filename in os.listdir(m2_cls_dir):
        image_list.append(filename)
    # Load the images
    for i in image_list:
        img_dir = os.path.join(m2_cls_dir, i)
        img = cv2.imread(img_dir)
        # Detect faces and write to file
        try:
            detected_face = DeepFace.detectFace(img)
            detected_face = detected_face * 255
            cv2.imwrite(m2_fc_data_dir + m2_cls[n] + i, detected_face[:, :, ::-1])
        except ValueError:
            pass
# Split train/val data set into 2 folders, namely train and val
for cls in m2_cls:
    class_dir = m2_fc_data_dir + cls # The folder to look for the images of each class
    img_names = os.listdir(class_dir) # Get the file names of the images
    random.shuffle(img_names) # Shuffle the file names
    # Check if there is a train and val folder and make one if not
    if os.path.isdir(m2_fc_data_split_dir + 'train/' + cls) is False:
        os.makedirs(m2_fc_data_split_dir + 'train/' + cls)
    if os.path.isdir(m2_fc_data_split_dir + 'val/' + cls) is False:
        os.makedirs(m2_fc_data_split_dir + 'val/' + cls)
    # Splitting the data
    train_ratio = 0.8
    train, val = np.split(img_names, [int(len(img_names) * train_ratio)])
    # Get the file names and split into each set (cls already ends with '/')
    train_names = [class_dir + name for name in train]
    val_names = [class_dir + name for name in val]
    print('For {}:'.format(cls))
    print('Total number of images: ', len(img_names))
    print('Total number of training images: ', len(train_names))
    print('Total number of validation images: ', len(val_names))
    print('\n')
    # Copy the images to the new folders
    for file_name in train_names:
        shutil.copy(file_name, m2_fc_data_split_dir + 'train/' + cls)
    for file_name in val_names:
        shutil.copy(file_name, m2_fc_data_split_dir + 'val/' + cls)
# Crop the faces for test data
for n in range(10): # 10 classes
    image_list = []
    m2_cls_dir = os.path.join(m2_test_data_dir, m2_cls[n])
    # Make folder if it does not exist
    if os.path.isdir(m2_fc_test_data_dir + m2_cls[n]) is False:
        os.makedirs(m2_fc_test_data_dir + m2_cls[n])
    # Get the file names
    for filename in os.listdir(m2_cls_dir):
        image_list.append(filename)
    # Load the images
    for i in image_list:
        img_dir = os.path.join(m2_cls_dir, i)
        img = cv2.imread(img_dir)
        # Detect faces and write to file
        try:
            detected_face = DeepFace.detectFace(img)
            detected_face = detected_face * 255
            cv2.imwrite(m2_fc_test_data_dir + m2_cls[n] + i, detected_face[:, :, ::-1])
        except ValueError:
            pass
The first model will distinguish faces of the Avengers cast from those of other people. The preprocessed data will be loaded and resized to 224 $\times$ 224 pixels, as required by the pretrained VGG16 network.
# Data loading & transformation
# Set up data directory
data_dir = '/content/drive/My Drive/MIE1517 Project/NEW Dataset for Module 1 & 2/Face Crop Module1 Data Split/'
train_dir = os.path.join(data_dir, 'train/')
val_dir = os.path.join(data_dir, 'val/')
# Ensure all images are 224 x 224 by resizing them
data_transform = transforms.Compose([transforms.Resize((224, 224)),
transforms.ToTensor()])
train_data = datasets.ImageFolder(train_dir, transform = data_transform)
val_data = datasets.ImageFolder(val_dir, transform = data_transform)
# Check number of images
print('Number of training images: ', len(train_data))
print('Number of validation images: ', len(val_data))
The pretrained model that will be used for transfer learning is VGG16. The images will be fed into VGG16 and then the output right before the fully-connected layers will be saved to file to prevent rerunning the code every time.
# Load pretrained VGG16 model
vgg_net = torchvision.models.vgg16(pretrained = True)
# Directory to save the output features
vgg16_model1_dir = '/content/drive/My Drive/MIE1517 Project/NEW Dataset for Module 1 & 2/Model1_VGG16/'
if os.path.isdir(vgg16_model1_dir) is False:
    os.makedirs(vgg16_model1_dir)
# Saving the output features
def save_features(data, batch_size, data_name):
    n = 1
    for imgs, labels in torch.utils.data.DataLoader(data, batch_size = batch_size, shuffle = True):
        if use_cuda and torch.cuda.is_available(): # GPU
            imgs = imgs.cuda()
            labels = labels.cuda()
            vgg_net.cuda() # The pretrained model must be on the same device as the inputs
        features = vgg_net.features(imgs) # Only use the feature extraction portion of VGG16
        # Setting the directory paths
        data_path = vgg16_model1_dir + get_data_name(data_name, batch_size, n)
        label_path = vgg16_model1_dir + "label_" + get_data_name(data_name, batch_size, n)
        original_data_path = vgg16_model1_dir + get_data_name(data_name, batch_size, n) + "original_data"
        # Save the features
        torch.save(features, data_path)
        torch.save(labels, label_path)
        torch.save(imgs, original_data_path)
        n += 1

def get_data_name(name, batch_size, n):
    """ Generate a name for the data with batch size
    """
    path = "vgg_16_data_{0}_bs{1}_{2}.pt".format(name, batch_size, n)
    return path
Due to GPU memory limits, a batch size of 32 was selected. This batch size also worked well in the subsequent model training process.
# The test data was cropped earlier and saved under m1_fc_test_data_dir
test_data = datasets.ImageFolder(m1_fc_test_data_dir, transform = data_transform)
save_features(train_data, 32, "train")
save_features(val_data, 32, "val")
save_features(test_data, 32, "test")
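For reference, the saved feature batches themselves are small: each image yields a 512 $\times$ 512-channel 7 $\times$ 7 feature map, so a batch of 32 in float32 is only about 3 MB (the GPU memory pressure comes mostly from the VGG16 forward pass on the 224 $\times$ 224 inputs, not from the saved features). A quick back-of-the-envelope check:

```python
# Size of one saved batch of VGG16 features (float32 = 4 bytes per value)
batch, channels, h, w = 32, 512, 7, 7
n_floats = batch * channels * h * w   # values in one batch of feature maps
size_mb = n_floats * 4 / 2**20        # bytes -> MiB
print(n_floats, round(size_mb, 2))    # 802816 values, about 3.06 MB
```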
A custom classifier is written and trained on the output features from the VGG16 feature extractor. There is a lot of flexibility here, but we decided to keep it simple due to computation limitations. The training function is also written along with the accuracy function.
class VGG16Classifier(nn.Module):
    def __init__(self):
        super(VGG16Classifier, self).__init__()
        self.conv1 = nn.Conv2d(512, 50, 2)
        self.conv2 = nn.Conv2d(50, 10, 2)
        self.fc1 = nn.Linear(250, 50)
        self.fc2 = nn.Linear(50, 2)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = x.view(-1, 250)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
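To see why the flatten size in `forward` is 250, trace the spatial dimensions through the two convolutions: with kernel size 2, stride 1, and no padding, each convolution shrinks each spatial side by 1. The helper below is just arithmetic for illustration:

```python
# Trace tensor shapes through VGG16Classifier (kernel = 2, stride = 1, no padding)
def conv_out(size, kernel, stride = 1, padding = 0):
    """Standard conv output-size formula."""
    return (size + 2 * padding - kernel) // stride + 1

h, w = 7, 7                            # VGG16 feature map is 512 x 7 x 7
h, w = conv_out(h, 2), conv_out(w, 2)  # after conv1: 50 channels, 6 x 6
h, w = conv_out(h, 2), conv_out(w, 2)  # after conv2: 10 channels, 5 x 5
flat = 10 * h * w                      # flattened input to fc1
print(h, w, flat)                      # 5 5 250
```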
# Accuracy function
def get_accuracy(model, data_loader, batch_size, train = False, val = False, test = False):
    correct = 0
    total = 0
    k = 1
    # Load saved feature data and labels
    for imgs, labels in data_loader:
        if train:
            file_name = vgg16_model1_dir + get_data_name("train", batch_size, k)
            label_name = vgg16_model1_dir + "label_" + get_data_name("train", batch_size, k)
        if val:
            file_name = vgg16_model1_dir + get_data_name("val", batch_size, k)
            label_name = vgg16_model1_dir + "label_" + get_data_name("val", batch_size, k)
        if test:
            file_name = vgg16_model1_dir + get_data_name("test", batch_size, k)
            label_name = vgg16_model1_dir + "label_" + get_data_name("test", batch_size, k)
        k += 1
        imgs = torch.load(file_name)
        labels = torch.load(label_name)
        #############################################
        # To Enable GPU Usage
        if use_cuda and torch.cuda.is_available():
            imgs = imgs.cuda()
            labels = labels.cuda()
        #############################################
        output = model(imgs)
        pred = output.max(1, keepdim = True)[1] # Select index with maximum prediction score
        correct += pred.eq(labels.view_as(pred)).sum().item()
        total += imgs.shape[0]
    return correct / total
# Training function
def train(model, data, batch_size, num_epochs, lr, printed = True, plotted = True):
    train_loader = torch.utils.data.DataLoader(data, batch_size = batch_size,
                                               num_workers = 0, shuffle = False)
    val_loader = torch.utils.data.DataLoader(val_data, batch_size = batch_size,
                                             num_workers = 0, shuffle = False)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr = lr, momentum = 0.9)
    iters, losses, train_acc, val_acc = [], [], [], []
    # Training
    n = 0 # The number of iterations
    for epoch in range(num_epochs):
        k = 1
        print("Training Epoch " + str(epoch + 1))
        for imgs, labels in iter(train_loader):
            # Load saved feature data and labels
            file_name = vgg16_model1_dir + get_data_name("train", batch_size, k)
            label_name = vgg16_model1_dir + "label_" + get_data_name("train", batch_size, k)
            k += 1
            imgs = torch.load(file_name)
            labels = torch.load(label_name)
            #############################################
            # To Enable GPU Usage
            if use_cuda and torch.cuda.is_available():
                imgs = imgs.cuda()
                labels = labels.cuda()
            #############################################
            out = model(imgs) # Forward pass
            loss = criterion(out, labels) # Compute the total loss
            loss.backward() # Backward pass (compute parameter updates)
            optimizer.step() # Make the updates for each parameter
            optimizer.zero_grad() # Clean up step for PyTorch
            # Save the current training information
            iters.append(n)
            losses.append(float(loss) / batch_size) # Compute *average* loss
            training_accuracy = get_accuracy(model, train_loader, batch_size, train = True)
            train_acc.append(training_accuracy) # Compute training accuracy
            validation_accuracy = get_accuracy(model, val_loader, batch_size, val = True)
            val_acc.append(validation_accuracy) # Compute validation accuracy
            # Print the training information
            if printed:
                print("Iteration " + str(n) + ":")
                print("CE Loss: " + str(float(loss) / batch_size))
                print("Training Accuracy = " + str(training_accuracy))
                print("Validation Accuracy = " + str(validation_accuracy))
            n += 1
    # Save the model
    model_path = "vgg16model_bs{0}_lr{1}_epoch{2}".format(batch_size, lr, num_epochs)
    model_path = vgg16_model1_dir + model_path
    torch.save(model.state_dict(), model_path)
    # Print learning curves
    if plotted:
        plt.title("Training Curve")
        plt.plot(iters, losses, label = "Train")
        plt.xlabel("Iterations")
        plt.ylabel("Loss")
        plt.show()
        plt.title("Training Curve")
        plt.plot(iters, train_acc, label = "Train")
        plt.plot(iters, val_acc, label = "Validation")
        plt.xlabel("Iterations")
        plt.ylabel("Training Accuracy")
        plt.legend(loc = 'best')
        plt.show()
    print("Final Training Accuracy: {}".format(train_acc[-1]))
    print("Final Validation Accuracy: {}".format(val_acc[-1]))
We found that the results are best when the model is trained with the following hyperparameters:
Batch size = 32
Number of epochs = 5
Learning rate = 0.005
use_cuda = True
model_32_5_005 = VGG16Classifier()
if use_cuda and torch.cuda.is_available():
    model_32_5_005.cuda()
    print('CUDA is available! Training on GPU ...')
else:
    print('CUDA is not available. Training on CPU ...')
#proper model
print("Training Model:batch_size = 32, num_epochs = 5, lr = 0.005")
train(model_32_5_005, train_data, batch_size = 32, num_epochs = 5, lr = 0.005, printed = False)
Assuming the model has already been trained and saved, it can be loaded directly. The chosen model will be used on the test data.
# Load the saved data
test_loader = torch.utils.data.DataLoader(test_data, batch_size = 32, num_workers = 0, shuffle = False)
# Load the chosen model
trained_model_path = "vgg16model_bs{0}_lr{1}_epoch{2}".format(32, 0.005, 5)
trained_model_path = vgg16_model1_dir + trained_model_path
trained_model = VGG16Classifier()
use_cuda = True
if use_cuda and torch.cuda.is_available():
    trained_model = trained_model.cuda()
trained_model.load_state_dict(torch.load(trained_model_path))
A function that prints out our testing results for Model 1 is written below.
# Printing the test results
def print_test_results(model, data_loader, batch_size):
    correct, total = 0, 0
    k = 1
    img_correct_avg, label_correct_avg = [], []
    img_correct_non_avg, label_correct_non_avg = [], []
    img_notcorrect_avg, label_notcorrect_avg = [], []
    img_notcorrect_non_avg, label_notcorrect_non_avg = [], []
    # Evaluating model on test data
    for imgs, labels in data_loader:
        file_name = vgg16_model1_dir + get_data_name("test", batch_size, k)
        label_name = vgg16_model1_dir + "label_" + get_data_name("test", batch_size, k)
        original_data_name = vgg16_model1_dir + get_data_name("test", batch_size, k) + "original_data"
        k += 1
        imgs = torch.load(file_name)
        labels = torch.load(label_name)
        original_imgs = torch.load(original_data_name)
        #############################################
        # To Enable GPU Usage
        if use_cuda and torch.cuda.is_available():
            imgs = imgs.cuda()
            labels = labels.cuda()
            original_imgs = original_imgs.cuda()
        #############################################
        output = model(imgs)
        pred = output.max(1, keepdim = True)[1] # Select index with maximum prediction score
        correct += pred.eq(labels.view_as(pred)).sum().item()
        total += imgs.shape[0]
        # Get the accuracies for each class (a better alternative is a confusion matrix)
        for i in range(imgs.shape[0]):
            if pred.eq(labels.view_as(pred))[i][0]:
                if labels[i] == 0:
                    img_correct_avg.append(original_imgs[i])
                    label_correct_avg.append(0)
                else:
                    img_correct_non_avg.append(original_imgs[i])
                    label_correct_non_avg.append(1)
            else:
                if labels[i] == 0:
                    img_notcorrect_avg.append(original_imgs[i])
                    label_notcorrect_avg.append(0)
                else:
                    img_notcorrect_non_avg.append(original_imgs[i])
                    label_notcorrect_non_avg.append(1)
    print("The testing accuracy is: " + str(correct / total))
    print(" ")
    num_avg_correct = len(label_correct_avg)
    num_non_avg_correct = len(label_correct_non_avg)
    num_avg_notcorrect = len(label_notcorrect_avg)
    num_non_avg_notcorrect = len(label_notcorrect_non_avg)
    print(str(num_avg_correct) + " Avengers images were identified correctly.")
    print(str(num_avg_notcorrect) + " Avengers images were identified as non-Avengers images.")
    print(" ")
    print(str(num_non_avg_correct) + " non-Avengers images were identified correctly.")
    print(str(num_non_avg_notcorrect) + " non-Avengers images were identified as Avengers images.")
    return (img_correct_avg, label_correct_avg, img_correct_non_avg, label_correct_non_avg,
            img_notcorrect_avg, label_notcorrect_avg, img_notcorrect_non_avg, label_notcorrect_non_avg)
The quantitative and qualitative results on the test data are shown below; we discuss them toward the end of the notebook.
img_correct_avg,label_correct_avg,img_correct_non_avg,label_correct_non_avg,img_notcorrect_avg,label_notcorrect_avg,img_notcorrect_non_avg,label_notcorrect_non_avg = print_test_results(trained_model, test_loader, 32)
# Visualize some correct test results
classes = ['Avengers', 'Non-Avengers']
print("Some examples of Avengers images that were identified correctly:")
# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize = (25, 10))
for idx in range(10):
    ax = fig.add_subplot(1, 10, idx + 1, xticks = [], yticks = [])
    plt.imshow(np.transpose(img_correct_avg[idx].cpu().numpy(), (1, 2, 0)))
    ax.set_title(classes[label_correct_avg[idx]])
# Visualize some incorrect test results
print("Some examples of Avengers images that were not identified correctly:")
# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize = (25, 10))
for idx in range(10):
    ax = fig.add_subplot(1, 10, idx + 1, xticks = [], yticks = [])
    plt.imshow(np.transpose(img_notcorrect_avg[idx].cpu().numpy(), (1, 2, 0)))
    ax.set_title(classes[label_notcorrect_avg[idx]])
# Visualize some incorrect test results
print("Some examples of non-Avengers images that were not identified correctly:")
# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize = (25, 10))
for idx in range(10):
    ax = fig.add_subplot(1, 10, idx + 1, xticks = [], yticks = [])
    plt.imshow(np.transpose(img_notcorrect_non_avg[idx].cpu().numpy(), (1, 2, 0)))
    ax.set_title(classes[label_notcorrect_non_avg[idx]])
The second model will distinguish between the faces of the different Avengers cast members. The preprocessed data will be loaded and resized to 224 $\times$ 224 pixels.
# Data loading & transformation
# Set up data directory
data_dir = '/content/drive/My Drive/MIE1517 Project/NEW Dataset for Module 1 & 2/Face Crop Module2 Data Split/'
weight_dir = '/content/drive/My Drive/MIE1517 Project/Saved Weight/M2/'
train_dir = os.path.join(data_dir, 'train/')
val_dir = os.path.join(data_dir, 'val/')
# Ensure all images are 224 x 224 by resizing them
data_transform = transforms.Compose([transforms.Resize((224, 224)),
transforms.ToTensor()])
train_data = datasets.ImageFolder(train_dir, transform = data_transform)
val_data = datasets.ImageFolder(val_dir, transform = data_transform)
# Check number of images
print('Number of training images: ', len(train_data))
print('Number of validation images: ', len(val_data))
The pretrained model that will be used for transfer learning is ResNet50. The images will be fed into ResNet50 and then the output right before the fully-connected layers will be saved to file to prevent rerunning the code every time.
# Load the pretrained model
resnet50 = torchvision.models.resnet50(pretrained = True)
resnet50
We use ResNet50 only up to the layer before the average pooling and fully-connected layers. We assign the ResNet50 layers from the beginning up to that point as the feature extractor and freeze its parameters.
# Getting all of the layers before average pooling
modules = list(resnet50.children())[:-2]
resnet50 = nn.Sequential(*modules)
for p in resnet50.parameters():
    p.requires_grad = False # Freeze the parameters
# Getting the features
def resnet50_features(data, batch_size, use_cuda = False):
    loader = torch.utils.data.DataLoader(data, batch_size = batch_size, shuffle = True)
    dataiter = iter(loader)
    imgs, labels = next(dataiter)
    #############################################
    # To Enable GPU Usage
    if use_cuda and torch.cuda.is_available():
        imgs = imgs.cuda()
        labels = labels.cuda()
        resnet50.cuda()
    #############################################
    features = resnet50(imgs)
    print(features.shape, labels.shape)
    return features, labels
# Get the features
train_features, train_labels = resnet50_features(train_data, batch_size = len(train_data))
val_features, val_labels = resnet50_features(val_data, batch_size = len(val_data))
# Save computed features and labels
torch.save(train_features, '/content/drive/My Drive/MIE1517 Project/NEW Dataset for Module 1 & 2/Module2_Face_Crop_Resnet/train_features_resnet50.pt')
torch.save(train_labels, '/content/drive/My Drive/MIE1517 Project/NEW Dataset for Module 1 & 2/Module2_Face_Crop_Resnet/train_labels_resnet50.pt')
torch.save(val_features, '/content/drive/My Drive/MIE1517 Project/NEW Dataset for Module 1 & 2/Module2_Face_Crop_Resnet/val_features_resnet50.pt')
torch.save(val_labels, '/content/drive/My Drive/MIE1517 Project/NEW Dataset for Module 1 & 2/Module2_Face_Crop_Resnet/val_labels_resnet50.pt')
# Load computed features and labels
train_features = torch.load('/content/drive/My Drive/MIE1517 Project/NEW Dataset for Module 1 & 2/Module2_Face_Crop_Resnet/train_features_resnet50.pt')
train_labels = torch.load('/content/drive/My Drive/MIE1517 Project/NEW Dataset for Module 1 & 2/Module2_Face_Crop_Resnet/train_labels_resnet50.pt')
val_features = torch.load('/content/drive/My Drive/MIE1517 Project/NEW Dataset for Module 1 & 2/Module2_Face_Crop_Resnet/val_features_resnet50.pt')
val_labels = torch.load('/content/drive/My Drive/MIE1517 Project/NEW Dataset for Module 1 & 2/Module2_Face_Crop_Resnet/val_labels_resnet50.pt')
A custom classifier is again written and trained on the output features from the ResNet50 feature extractor. There is a lot of flexibility here, but unlike Model 1, we went with a more complex classifier due to the much larger number of classes. The training function is written below along with the accuracy function and checkpoint function.
# Accuracy function
def get_accuracy_class(model, feature_loader, label_loader, use_cuda = False):
correct = 0
total = 0
for feats, labels in zip(feature_loader, label_loader):
#############################################
# To Enable GPU Usage
if use_cuda and torch.cuda.is_available():
feats = feats.cuda()
labels = labels.cuda()
model.cuda()
#############################################
output = model(feats)
pred = output.max(1, keepdim=True)[1] # Select index with maximum prediction score
correct += pred.eq(labels.view_as(pred)).sum().item()
total += feats.shape[0]
return correct / total
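The prediction logic inside the accuracy function can be illustrated on a toy batch (hypothetical tensors, not from the notebook's data): `max(1, keepdim=True)[1]` picks the highest-scoring class per row, and `eq` compares against the labels.

```python
import torch

# Toy logits for 4 samples over 2 classes, and their true labels
logits = torch.tensor([[2.0, 0.1],
                       [0.3, 1.5],
                       [0.9, 0.2],
                       [0.1, 0.4]])
labels = torch.tensor([0, 1, 0, 0])

pred = logits.max(1, keepdim=True)[1]             # index of the max score per row
correct = pred.eq(labels.view_as(pred)).sum().item()
accuracy = correct / labels.shape[0]
print(accuracy)  # 0.75 (the last sample is misclassified)
```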
# Create a path to the model trained using a set of hyperparameters
def check_point(name, batch_size, learning_rate, epoch):
""" Generate a name for the model consisting of all the hyperparameter values
Args:
name: Model name
batch_size: Training batch size
learning_rate: Optimizer learning rate
epoch: Epoch number at which the checkpoint is saved
Returns:
path: A string with the hyperparameter name and value concatenated
"""
path = "model_{0}_bs{1}_lr{2}_epoch{3}".format(name,
batch_size,
learning_rate,
epoch)
return path
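As a quick illustration (reproducing the formatting logic of `check_point` in a self-contained snippet), the hyperparameters chosen later in this section produce the following path string:

```python
def check_point(name, batch_size, learning_rate, epoch):
    # Same string formatting as the notebook's check_point function
    return "model_{0}_bs{1}_lr{2}_epoch{3}".format(name, batch_size, learning_rate, epoch)

path = check_point('classification', 512, 0.0005, 100)
print(path)  # model_classification_bs512_lr0.0005_epoch100
```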
# Classification architecture and training function
class classification(nn.Module):
def __init__(self):
super(classification, self).__init__()
self.name = 'classification'
self.fc1 = nn.Linear(2048*7*7, 2048)
self.fc2 = nn.Linear(2048, 512)
self.fc3 = nn.Linear(512, 10)
def forward(self, x):
x = x.view(-1, 2048*7*7)
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
x = self.fc3(x)
return x
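The classifier flattens each 2048×7×7 ResNet50 feature map into a 100,352-dimensional vector before the fully-connected layers. A quick shape check of that flattening step (hypothetical snippet, using random tensors in place of real features):

```python
import torch

x = torch.randn(4, 2048, 7, 7)   # a batch of 4 ResNet50 feature maps
flat = x.view(-1, 2048*7*7)      # flatten for the fully-connected layers
print(flat.shape)  # torch.Size([4, 100352])
```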
def train_class(model, features, feat_labels, batch_size = 64, num_epochs = 1, learning_rate = 0.01, use_cuda = False, use_val = True):
# Load features (shuffle must stay False so the feature and label loaders remain aligned)
train_loader = torch.utils.data.DataLoader(features, batch_size = batch_size, shuffle = False)
if use_val:
val_loader = torch.utils.data.DataLoader(val_features, batch_size = batch_size, shuffle = False)
# Load labels
train_label_loader = torch.utils.data.DataLoader(feat_labels, batch_size = batch_size, shuffle = False)
if use_val:
val_label_loader = torch.utils.data.DataLoader(val_labels, batch_size = batch_size, shuffle = False)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr = learning_rate)
iters, losses, train_acc, val_acc = [], [], [], []
# Training
n = 0 # The number of iterations
for epoch in range(num_epochs):
for feats, labels in zip(train_loader, train_label_loader): # Get the labels and features
#############################################
# To Enable GPU Usage
if use_cuda and torch.cuda.is_available():
labels = labels.cuda()
feats = feats.cuda()
model.cuda()
#############################################
out = model(feats) # Forward pass
loss = criterion(out, labels) # Compute the total loss
loss.backward() # Backward pass (compute parameter updates)
optimizer.step() # Make the updates for each parameter
optimizer.zero_grad() # Clean up step for PyTorch
# Save the current training information
iters.append(n)
losses.append(float(loss)) # nn.CrossEntropyLoss already returns the batch-average loss
train_acc.append(get_accuracy_class(model, train_loader, train_label_loader, use_cuda)) # Compute training accuracy
if use_val:
val_acc.append(get_accuracy_class(model, val_loader, val_label_loader, use_cuda)) # Compute validation accuracy
n += 1
# Save the current model (checkpoint) to a file every 20 epochs and at the final epoch
if (epoch+1)%20 == 0:
print('epoch: ', epoch + 1)
model_path = check_point(model.name, batch_size, learning_rate, epoch + 1)
model_path = weight_dir + model_path
torch.save(model.state_dict(), model_path)
elif (epoch+1) == num_epochs:
model_path = check_point(model.name, batch_size, learning_rate, epoch + 1)
model_path = weight_dir + model_path
torch.save(model.state_dict(), model_path)
print('epoch: ', epoch + 1)
# Plot the learning curves
plt.title("Training Curve")
plt.plot(iters, losses, label="Train")
plt.xlabel("Iterations")
plt.ylabel("Loss")
plt.show()
plt.title("Training Curve")
plt.plot(iters, train_acc, label="Train")
if use_val:
plt.plot(iters, val_acc, label="Validation")
plt.xlabel("Iterations")
plt.ylabel("Accuracy")
plt.legend(loc='best')
plt.show()
print("Final Training Accuracy: {}".format(train_acc[-1]))
if use_val:
print("Final Validation Accuracy: {}".format(val_acc[-1]))
# Detach the features from the computation graph so no gradients flow back into ResNet50
train_feats = train_features.cpu().detach()
val_feats = val_features.cpu().detach()
We found that the results are the best when the model is trained using the following parameters:
Batch Size = 512 \ Number of Epochs = 100 \ Learning Rate = 0.0005
# Training
model = classification()
train_class(model, train_feats, train_labels, batch_size = 512, num_epochs = 100, learning_rate = 0.0005, use_cuda = use_cuda)
Assuming the model has already been trained and saved, it can be loaded directly. The chosen model will be used on the test data.
# Data loading & transformation of test data
# Set up data directory
dir = '/content/drive/My Drive/MIE1517 Project/Test Data/M2_Face_Cropped_Test/' # Root directory of the test data
# Ensure all images are 224 x 224 by resizing them
data_transform = transforms.Compose([transforms.Resize((224, 224)),
transforms.ToTensor()])
new_data = datasets.ImageFolder(dir, transform = data_transform)
# Check number of test images
print('Number of new images: ', len(new_data))
# Load the images
img_loader = torch.utils.data.DataLoader(new_data, batch_size = len(new_data), shuffle = True)
# Split the data to the images and labels
dataiter = iter(img_loader)
images, name = next(dataiter) # dataiter.next() was removed in newer PyTorch versions
# Compute the features of the test data
def resnet50_test_features(images, name, batch_size, use_cuda = False):
imgs, labels = images, name
#############################################
# To Enable GPU Usage
if use_cuda and torch.cuda.is_available():
imgs = imgs.cuda()
labels = labels.cuda()
resnet50.cuda()
#############################################
features = resnet50(imgs)
print(features.shape, labels.shape)
return features, labels
# Compute features for test data using ResNet50 and save them
new_test_features, new_test_labels = resnet50_test_features(images, name, len(images), use_cuda = use_cuda)
torch.save(new_test_features, '/content/drive/My Drive/MIE1517 Project/Test Data/Module 2-ResNet50/new_test_features')
torch.save(new_test_labels, '/content/drive/My Drive/MIE1517 Project/Test Data/Module 2-ResNet50/new_test_labels')
# Load test data
new_test_features = torch.load('/content/drive/My Drive/MIE1517 Project/Test Data/Module 2-ResNet50/new_test_features')
new_test_labels = torch.load('/content/drive/My Drive/MIE1517 Project/Test Data/Module 2-ResNet50/new_test_labels')
new_test_loader = torch.utils.data.DataLoader(new_test_features, batch_size = len(new_test_features), shuffle = False)
new_test_label_loader = torch.utils.data.DataLoader(new_test_labels, batch_size = len(new_test_labels), shuffle = False)
# Accuracy of test data
def test_get_accuracy_class(model, feature_loader, label_loader, use_cuda = False):
correct = 0
total = 0
for feats, labels in zip(feature_loader, label_loader):
#############################################
# To Enable GPU Usage
if use_cuda and torch.cuda.is_available():
feats = feats.cuda()
labels = labels.cuda()
model.cuda()
#############################################
output = model(feats)
for i in range(len(output)):
out = output[i].cpu().detach()
prob_dist = torch.softmax(out, dim = 0) # Use softmax to obtain the probabilities
top3_prob, top3_class = torch.topk(prob_dist, 3) # Obtain the largest 3 probabilities and the indices
plt.imshow(np.transpose(images_np[i], (1, 2, 0))) # Show the image from np array
plt.title(classes[name[i]]) # Set the label class as title
plt.axis('off') # Don't show the axis of the plot
plt.show() # Show the image
for n in range(3): # Print the top 3 probabilities and the corresponding class
print(classes[top3_class[n]], ':', top3_prob[n].numpy())
print('\n')
pred = output.max(1, keepdim=True)[1] # Select index with maximum prediction score
correct += pred.eq(labels.view_as(pred)).sum().item()
total += feats.shape[0]
return correct / total
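The softmax/top-3 logic used above can be illustrated on a toy logit vector (hypothetical values): softmax converts the logits to a probability distribution, and `torch.topk` returns the three largest probabilities along with their class indices.

```python
import torch

out = torch.tensor([3.0, 1.0, 0.2, 2.0])      # toy logits over 4 classes
prob_dist = torch.softmax(out, dim = 0)       # convert logits to probabilities
top3_prob, top3_class = torch.topk(prob_dist, 3)
print(top3_class.tolist())  # [0, 3, 1] -- classes ranked by probability
```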
The quantitative and qualitative results on the test data are shown below; we discuss them towards the end of the notebook.
trained_model = classification()
if use_cuda and torch.cuda.is_available():
trained_model.cuda()
trained_model_path = "model_{0}_bs{1}_lr{2}_epoch{3}".format(trained_model.name, 512, 0.0005, 100)
trained_model_path = weight_dir + trained_model_path
trained_model.load_state_dict(torch.load(trained_model_path, map_location = 'cpu')) # map_location lets GPU-trained weights load on CPU
test_accuracy = test_get_accuracy_class(trained_model, new_test_loader, new_test_label_loader, use_cuda = use_cuda)
print('The test classification accuracy is {:.5f}'.format(test_accuracy))
Now that both models are trained and tested, it is time to combine them into a complete pipeline from start to finish. The objective is to simply input an image and get the results without any additional user interaction. Since the models only recognize faces, and the vast majority of pictures contain some kind of background or other objects, an additional preprocessing step on the input images is required. As part of this step, we use the MTCNN face detector from the facenet-pytorch package. More information can be found here: https://github.com/timesler/facenet-pytorch
# Install and load required libraries
!pip install facenet-pytorch
!pip install mmcv
!pip install mtcnn
from facenet_pytorch import MTCNN
from google.colab.patches import cv2_imshow
import mmcv, cv2
To keep things simple and organized, we write functions containing all of the code needed to prepare models 1 and 2. In other words, running the cells below is sufficient to use the pipeline without rerunning the sections above (assuming the weights have already been trained and saved).
# Model 1
def prep_model_1(use_cuda = False):
"""
Prepare model 1.
In:
use_cuda = GPU
Out:
feat_extract_1 = feature extractor (VGG16)
model_1 = trained classifier for Model 1
"""
# Define Classifier_1 architecture (Same as the one in Part 2)
class Classifier_1(nn.Module):
def __init__(self):
super(Classifier_1, self).__init__()
self.conv1 = nn.Conv2d(512, 50,2)
self.conv2 = nn.Conv2d(50, 10,2)
self.fc1 = nn.Linear(250,50)
self.fc2 = nn.Linear(50,2)
def forward(self, x):
x = F.relu(self.conv1(x))
x = F.relu(self.conv2(x))
x = x.view(-1,250)
x = F.relu(self.fc1(x))
x = self.fc2(x)
return x
# Initiate models
model_1 = Classifier_1()
feat_extract_1 = torchvision.models.vgg16(pretrained = True).features # Just get the feature extraction portion
# GPU
if use_cuda and torch.cuda.is_available():
feat_extract_1 = feat_extract_1.cuda()
model_1 = model_1.cuda()
# Get trained weights
model_1_dir = '/content/drive/My Drive/MIE1517 Project/Saved Weight/M1/'
model_1_weights_dir = 'vgg16model_bs32_lr0.005_epoch5'
model_1_path = model_1_dir + model_1_weights_dir
model_1.load_state_dict(torch.load(model_1_path, map_location = 'cpu')) # map_location lets GPU-trained weights load on CPU
# Set to evaluation mode
feat_extract_1.eval()
model_1.eval()
return feat_extract_1, model_1
# Model 2
def prep_model_2(use_cuda = False):
"""
Prepare model 2.
In:
use_cuda = GPU
Out:
resnet50 = feature extractor (ResNet50)
model_2 = trained classifier for Model 2
"""
# Define Classifier_2 architecture (Same as the one in Part 3)
class Classifier_2(nn.Module):
def __init__(self):
super(Classifier_2, self).__init__()
self.name = 'Classifier_2'
self.fc1 = nn.Linear(2048*7*7, 2048)
self.fc2 = nn.Linear(2048, 512)
self.fc3 = nn.Linear(512, 10)
def forward(self, x):
x = x.view(-1, 2048*7*7)
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
x = self.fc3(x)
return x
# Initiate models
model_2 = Classifier_2()
resnet50 = torchvision.models.resnet50(pretrained = True)
modules = list(resnet50.children())[:-2] # Just get the feature extraction portion
feat_extract_2 = nn.Sequential(*modules)
# GPU
if use_cuda and torch.cuda.is_available():
feat_extract_2 = feat_extract_2.cuda()
model_2 = model_2.cuda()
# Get trained weights
model_2_dir = '/content/drive/My Drive/MIE1517 Project/Saved Weight/M2/'
model_2_weights_dir = 'model_classification_bs512_lr0.0005_epoch100'
model_2_path = model_2_dir + model_2_weights_dir
model_2.load_state_dict(torch.load(model_2_path, map_location = 'cpu')) # map_location lets GPU-trained weights load on CPU
# Set to evaluation mode
feat_extract_2.eval()
model_2.eval()
return feat_extract_2, model_2
Next, we write a function that combines models 1 and 2; this implements the pipeline diagram shown at the beginning.
def Predict(img, use_cuda = False):
"""
Combine models 1 and 2.
In:
use_cuda = Whether to run on the GPU
img = Facial image
Out:
prediction = A list containing either 'Not an Avengers cast!' or the top-3 predicted cast names
"""
# List the Avengers casts
classes = ['Anthony Mackie', 'Chris Evans', 'Chris Hemsworth',
'Elizabeth Olsen', 'Jeremy Renner', 'Mark Ruffalo',
'Robert Downey Jr', 'Scarlett Johansson', 'Tom Hiddleston',
'Tom Holland']
# Ensure image is a Pytorch tensor of 224 x 224 size, and dim order is (C, H, W)
data_transform = transforms.Compose([transforms.ToPILImage(mode = 'RGB'), transforms.ToTensor(), transforms.Resize((224, 224)) ])
img = data_transform(img)
img = img.unsqueeze(0) # Add batch dim to dim 0
# GPU
if use_cuda and torch.cuda.is_available():
img = img.cuda()
feat_extract_1, model_1 = prep_model_1(use_cuda) # Prepare model 1
pred_1 = model_1(feat_extract_1(img)).max(1, keepdim=True)[1]
# If the model predicts the person is an Avengers cast, predict the name
# Otherwise, output that the person is not part of the cast
if pred_1 == 0:
feat_extract_2, model_2 = prep_model_2(use_cuda) # Prepare model 2
pred_2 = model_2(feat_extract_2(img))
# Get top 3 predictions
prediction = []
prob_dist = torch.softmax(pred_2.cpu().detach(), dim = 1) # Use softmax to obtain the probabilities
top3_prob, top3_class = torch.topk(prob_dist, 3) # Obtain the largest 3 probabilities and the indices
for idx in range(3):
prediction.append(classes[int(top3_class.squeeze(0)[idx])])
return prediction
else:
prediction = ["Not an Avengers cast!"]
return prediction
Finally, the face detection preprocessing step is combined to create the entire pipeline.
def Demonstration(dir, use_cuda = True):
"""
Combining everything together.
In:
use_cuda = GPU
dir = Directory of the image to be tested
Out:
The image with a bounding box of the face(s) and the prediction(s)
"""
# Detect face
frame = cv2.imread(dir)
mtcnn = MTCNN(keep_all = True, device = device)
boxes, _ = mtcnn.detect(frame)
frame_draw = frame.copy()
if boxes is None: # mtcnn.detect returns None when no face is found
print('No face detected!')
return
for box in boxes:
bounding_box = [int(i) for i in box.tolist()] # Integer (x1, y1, x2, y2) pixel coordinates
frame_cropped = frame[bounding_box[1]:bounding_box[3], bounding_box[0]:bounding_box[2]] # Crop each face
# Add prediction to each cropped face
prediction = Predict(frame_cropped, use_cuda)
# Draw rectangle along the cropped face and add prediction label
cv2.rectangle(frame_draw,
(bounding_box[0], bounding_box[1]),
(bounding_box[2], bounding_box[3]),
(0, 0, 255),
thickness = 2) # cv2 expects integer pixel coordinates
# Add the predictions (one line of text per prediction, below the box)
for k, label in enumerate(prediction):
cv2.putText(frame_draw, label, (bounding_box[0], bounding_box[3] + 25*(k + 1)),
cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2, cv2.LINE_AA)
cv2_imshow(frame_draw)
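The cropping step above relies on NumPy slicing with integer coordinates; MTCNN returns float box coordinates as (x1, y1, x2, y2), and the image array is indexed rows (y) first, then columns (x). A minimal sketch with a hypothetical detection box:

```python
import numpy as np

frame = np.zeros((100, 100, 3), dtype=np.uint8)   # stand-in for a BGR frame
box = [10.7, 20.2, 50.9, 60.5]                    # hypothetical MTCNN output (x1, y1, x2, y2)
x1, y1, x2, y2 = [int(v) for v in box]            # convert to integer pixel coordinates
crop = frame[y1:y2, x1:x2]                        # note: rows (y) first, then columns (x)
print(crop.shape)  # (40, 40, 3)
```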
Testing on new examples that are not part of the test data.
dir = '/content/drive/My Drive/MIE1517 Project/Demonstration/666.jpg'
Demonstration(dir)
dir = '/content/drive/My Drive/MIE1517 Project/Demonstration/6666.JPG'
Demonstration(dir)
dir = '/content/drive/My Drive/MIE1517 Project/Demonstration/8.jpg'
Demonstration(dir)
dir = '/content/drive/My Drive/MIE1517 Project/Demonstration/11.jpg'
Demonstration(dir)
dir = '/content/drive/My Drive/MIE1517 Project/Demonstration/000.JPG'
Demonstration(dir)
dir = '/content/drive/My Drive/MIE1517 Project/Demonstration/45.jpg'
Demonstration(dir)
Looking at model 1 by itself, we expected a higher accuracy because there are only 2 classes, and that is what we observed. Some of our observations are the following:
For model 2, we expected a lower accuracy than model 1 because of the larger number of classes and the smaller dataset, which the accuracies confirm. Some specific observations we made are the following:
Overall, the entire pipeline worked surprisingly well, but there is clearly plenty of room for improvement. While it made accurate predictions for the most part, it still failed on some easy examples, and it did not do well on look-alikes (although those are challenging for humans as well).
Through this project, we learned the power of transfer learning and GPU acceleration, and the importance of data quality and quantity. Without transfer learning, it is very unlikely that we would have obtained acceptable results. Utilizing a powerful GPU saved many hours, even with Google Colab's usage limits. Lastly, our model would improve further with more high-quality data, which is usually not easy to obtain. Overall, this has been a great learning experience.
%%shell
jupyter nbconvert --to html MIE1517_Project_Guide.ipynb